Bayes Optimal Feature Selection for Supervised Learning with General Performance Measures

Authors

  • C. G. Saneem Ahmed
  • Harikrishna Narasimhan
  • Shivani Agarwal
Abstract

The problem of feature selection is critical in several areas of machine learning and data analysis. Here we consider feature selection for supervised learning problems, where one wishes to select a small set of features that facilitate learning a good prediction model in the reduced feature space. Our interest is primarily in filter methods that select features independently of the learning algorithm to be used and are generally faster to implement than wrapper methods. Many common filter methods for feature selection make use of mutual information based criteria to guide their search process. However, even in simple binary classification problems, mutual information based methods do not always select the best set of features in terms of the Bayes error. In this paper, we develop a filter method that directly aims to select the optimal set of features for a general performance measure of interest. Our approach uses the Bayes error with respect to the given performance measure as the criterion for feature selection and applies a greedy algorithm to optimize this criterion. We demonstrate application of this method to a variety of learning problems involving different performance measures. Experiments suggest the proposed approach is competitive with several state-of-the-art methods.
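The core idea, greedily selecting features that minimize a plug-in estimate of the Bayes error, can be sketched as follows for the special case of the 0-1 loss on discrete (or pre-discretized) features. This is an illustrative sketch under those assumptions, not the authors' algorithm, which handles general performance measures; the function names are hypothetical:

```python
import numpy as np

def bayes_error_estimate(X, y, subset):
    # Plug-in estimate of the Bayes 0-1 error using only the features in
    # `subset`: sum, over each observed cell of feature values, the mass of
    # the minority class in that cell, min(P(y=0, cell), P(y=1, cell)).
    cells = {}
    for xi, yi in zip(X[:, subset], y):
        counts = cells.setdefault(tuple(xi), [0, 0])
        counts[yi] += 1
    return sum(min(c) for c in cells.values()) / len(y)

def greedy_select(X, y, k):
    # Forward greedy search: repeatedly add the single feature whose
    # inclusion most reduces the estimated Bayes error.
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = min(remaining,
                   key=lambda f: bayes_error_estimate(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a toy dataset where one feature determines the label and another is pure noise, the greedy search picks the informative feature first, illustrating how a Bayes-error criterion can succeed where the feature ranking itself is trivial.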

Similar resources

Ghodke, Sumukh and Timothy Baldwin (2007) An Investigation into the Interaction between Feature Selection and Discretization: Learning How and When to Read Numbers, In Proceedings of the 20th Australian Joint Conference on Artificial Intelligence (AI07), Gold Coast, Australia, pp. 48-57

Pre-processing is an important part of machine learning, and has been shown to significantly improve the performance of classifiers. In this paper, we take a selection of pre-processing methods—focusing specifically on discretization and feature selection—and empirically examine their combined effect on classifier performance. In our experiments, we take 11 standard datasets and a selection of ...


The Impact of Feature Selection on the Accuracy of Bayes Classifier

This paper presents the impact of feature selection on the accuracy of the Bayes classifier. Six feature selection techniques were used, evaluated, and compared using a supervised learning algorithm on eight real and three artificial benchmark datasets. The accuracy of the classifier is influenced by the choice of feature selection technique. In our experiment, One-R improve...


Improved Genetic Algorithm Based Feature Selection Strategy Based Five Layered Artificial Neural Network Classifier (IGA-FLANN)

Data classification is one of the important research areas in the field of data mining. Machine learning algorithms such as naïve Bayes, neural networks, and support vector machines are most regularly used for performing the classification task. Supervised learning is one such setting, where the datasets contain class labels and the machine learning classifier is first trained using them. It ...


A Comparative Study on Different Types of Approaches to Bengali Document Categorization

Document categorization is a technique in which the category of a document is determined. In this paper, three well-known supervised learning techniques, Support Vector Machine (SVM), Naïve Bayes (NB), and Stochastic Gradient Descent (SGD), are compared for Bengali document categorization. Besides the classifier, classification also depends on how features are selected from the dataset. F...


Information-based supervised and semi-supervised feature selection

We merge the results from both supervised and semi-supervised feature selection techniques. The method was applied to the five datasets from the NIPS feature selection competition. As a preprocessing step, we first discretize each training dataset using the EM algorithm. Then, we filter the discretized dataset based on the MI (mutual information) value of each feature with respect to the class var...
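The MI-based filtering step described here can be sketched as follows. This is an illustrative example, not the competition entry's code; it assumes discretization has already produced integer-valued features, and the function names are hypothetical:

```python
import numpy as np
from collections import Counter

def mutual_information(feature, labels):
    # Empirical mutual information I(X; Y), in nats, between one discrete
    # feature and the class variable, computed from joint and marginal counts.
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    return sum((nxy / n) * np.log(nxy * n / (px[x] * py[y]))
               for (x, y), nxy in joint.items())

def mi_filter(X, y, k):
    # Score every column of X by its MI with the labels and keep the
    # indices of the k highest-scoring features.
    scores = [mutual_information(X[:, j].tolist(), y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k].tolist()
```

Note that a feature identical to the labels attains the maximum possible score (the label entropy, log 2 for balanced binary classes), while an independent feature scores zero, which is exactly the ranking behaviour an MI filter relies on.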




Publication date: 2015